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Summary 

Normal mode analysis offers an efficient way of modeling the conformational flexibility of protein 
structures. Simple models defined by contact topology, known as elastic network models, have been 
used to model a variety of systems, but the validation is typically limited to individual modes for a 
single protein. We use anisotropic displacement parameters from crystallography to test the quality 
of prediction of both the magnitude and directionality of conformational variance. Normal modes 
from four simple elastic network model potentials and from the CHARMM forcefield are calculated 
for a data set of 83 diverse, ultrahigh resolution crystal structures. While all five potentials provide 
good predictions of the magnitude of flexibility, the methods that consider all atoms have a clear 
edge at prediction of directionality, and the CHARMM potential produces the best agreement. The 
low-frequency modes from different potentials are similar, but those computed from the CHARMM 
potential show the greatest difference from the elastic network models. This was illustrated by 
computing the dynamic correlation matrices from different potentials for a PDZ domain structure. 
Comparison of normal mode results with anisotropic temperature factors opens the possibility of 
using ultrahigh resolution crystallographic data as a quantitative measure of molecular flexibility. 
The comprehensive evaluation demonstrates the costs and benefits of using normal mode potentials 
of varying complexity. Comparison of the dynamic correlation matrices suggests that a combination 
of topological and chemical potentials may help identify residues in which chemical forces make 
large contributions to intramolecular coupling. 



Abbreviations: ENM, elastic network model; ANM, anisotropic network model; DNM, distance net- 
work model; BNM, block normal modes; E1N, EINemo; HCA, harmonic Ca potential; ADP, anisotropic 
displacement parameter 
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Introduction 



The native state of a protein is an ensemble of conformers, deviating to some extent from the average 
coordinates reported as the experimental structure. Knowledge of the static structure is not sufficient for 
understanding the functional mechanisms, which often depend on the flexibility of protein structures. Exper- 
imental observation of conformational motion of biomolecules is becoming possible, thanks to experimental 
innovation, but remains a formidable challenge. Crystals can be subjected to time-resolved experiments 
dMoffat, 2001) ), but the range of applications is limited to reactions that can be triggered by light or trapped 
by clever manipulations. NMR spectroscopy can be used to determine both the structure and the dynamics 
of proteins ( jLindorff-Larsen et al., 2005D , but it is limited both by the maximum size of protein structures 
and by the difficulty of discrimination of slowly or quickly exchanging dynamics ( |Palmer et al., 2001[ ) . Mass 
spectrometry coupled with hydrogen/deuterium exchange and proteolysis has been used to determine changes 
in the relative solvent accessibility of amide hydrogens (Lanman & Prevelige, 2004), and single- molecule ex- 
periments using optical trapping have resulted in spectacular observations of the motion of motor proteins 
( jAbbo ndanzic rT et al., 2005[ ). In general, direct measurement of molecular motion remains laborious and 
limited. 

Computer simulations of biological macromolecules enable detailed explorations of the conformational 
ensemble near the native state (Karplus & Kuriyan, 2005 ). However, the computational cost of molecular dy- 
namics with all-atom forcefields limits the accessible timescale of simulations, particularly of large molecular 
assemblies. Thus, approximate methods, such as normal mode analysis (NMA), are often used to efficiently 
describe the allowed conformational ensemble of protein structures ( |Go et al, 1983; B rooks fc Karplus, 1983| 
|Levitt et al., 1985| . The decomposition into modes with different frequencies reduces the dimensional- 
ity of the problem, since a few lowest-frequency modes describe the most dominant directions of motion 
(Teodoro et al., 2003). These global modes have been used to predict protein flexibility (Cui et al., 2004) and 
to study the mechanism of conformational transitions necessary for protein function ( |Ma fc Karplus, 1997[ ). 
Simple coarse-grained potentials, such as Elastic Network Models, provide an efficient description of a pro- 
tein structure by connecting atoms or residues within a certain distance with identical harmonic potentials 
( |Tirion, 19 96). Despite the extreme simplification, these models capture the basic topology of a structure 
and generate predictions on the flexibility and preferred modes of motion of proteins that are in general 
agreement with experimental data ( |Bahar fc Rader,~2 005). 

The study of protein conformational dynamics requires an interplay between experiment and compu- 
tation. A readily available measure of conformational mobility is the Debye- Waller temperature factor, 
or B-factor, which models the variance in atomic position from the scattering data. It has been used as 
a source of information on protein flexibility for decades (Fraucnfelder et al., 1979), and as computational 
methodologies have matured, studies over large numbers of crystal structures have shown good agreement 
with computations, specifically with ENM results ( |Kundu et al., 20021 . While the classic B-factor has long 
been a routine parameter in protein structure refinement, until recently few crystal data sets contained suffi- 
ciently many observations (unique reflections) to allow determination of anisotropic displacement parameters 
(ADPs). These parameters model the probability distribution of atomic positions as a Gaussian function 
with ellipsoidal contours, and have been shown to significantly improve the refinement statistics for crys- 
tal structures of biological macromolecules QLonghi et al, 199~7| |Dauter et al., 199"7] |Esposito et al, 2000[ ) at 
resolution better than 1.2 A. ADPs have been used in a few studies as a qualitative indicator of the direction- 
ality of prevalent motion in a protein structure ( |Wilson fc Brunger, 2000[ ), but this source of experimental 
information has not been systematically exploited. 

The proliferation of various simple ENM-like models for macromolecular fluctuation begs the question of 
their relative fidelity and reliability, but no systematic comparison of the methods has been undertaken, to 
the best of our knowledge. In a recent study, 98 highest resolution crystal structures in the PDB were used 
for systematic evaluation of prediction of the magnitude of motion in protein structures using an isotropic 
ENM model fKondras hov et al., 2006[ ) . In the present work, we compare the quality of prediction of the 
magnitude and direction of structural variance for the most commonly used anisotropic ENM potentials, 
and introduce a new one to better model different chemical interactions. Comparison between ADPs and 
computational variance matrices allows a quantitative evaluation of the merits and drawbacks of different 
potentials. We also investigate the effect of the choice of potential on the global dynamic properties, such 
as the correlation matrix. 
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Results 



Analysis of crystallographic data 

The present study evaluates predictions of five coarse-grained normal mode potentials using a set of anisotropic 
displacement parameters from ultra- high resolution crystal structures. The Protein Data Bank ( jBerman et al., 2 000) 
was searched for all X-ray crystal structures of proteins with chain length of at least 50 residues, with res- 
olution at or beyond 1 A, with the restriction that the structures have less than 50% sequence identity. 83 
such structures were deposited with anisotropic displacement paremetrs, containing a total of 17763 protein 
residues. Excluding those with disordered Ca atoms or those involved in intermolecular crystal contacts, 
both of which have an effect on the ADP, left 12348 residues with usable ADPs. The anisotropic displacement 
parameters are commonly represented as ellipsoids in crystal structures, as shown in Figure [TJ and contain 
information about both the magnitude and the preferred direction of atomic variation in the crystal. The 
anisotropy of the ellipsoid, defined as the ratio of the smallest to the largest eigenvalue of ellipsoid matrix 



(Trueblood et al., 1996), is a measure of deviation from spherical shape. We separated the structures by 
refinement software used, and found different distributions of anisotropy for the Ca ADPs. 68 structures 
were refined using SHELX (Shcldrick & Sch neider, 1997[ ) , and the remaining were determined 15 using Ref- 



mac from the CCP4 suite (Collaborative Computati onal Project Number 4, 1994[ ). The Ca ADPs in the 



Refmac set had a mean anisotropy of 0.64, compared with 0.51 for the SHELX set (Figure [2]), suggesting 
that the crystallographic restraints used in the two programs have significant effects on resulting ADPs. 
Since sphere-like ellipsoids contain little directional information, a subset of ADPs with anisotropy of less 
than 0.5 was chosen, leaving 4642 ADPs to compare with the computational predictions of directionality of 
variance. 



Normal mode potentials 

Elastic Network Models are dependent on the choice of cutoff distance ( |Tirion, 1996[ ) , which separates atom 
pairs deemed in contact from those which are not interacting. We introduce a new ENM method, called 
distance-based network model (DNM), an elastic network model with multiple force constants for atomic 
contacts, as described in Methods. It is clear that atom pairs closer than 2.3 Aare covalently bound and thus 
have stronger interactions than those which are 5 Aapart. To mimic the chemistry, several discrete distance 
ranges were defined and the force constants for each category are set to be the reciprocal of the total number of 
contacts in this range. Since the number of atomic contacts grows with distance, this ensures that interactions 
between atoms farther away are represented by weaker force constants than those in close proximity. We have 
shown recently that a similar strengthening of force constants between covalently bound residues resulted 
in greatly improved variance prediction quality for an isotropic ENM (Kondras hov et al., 2006[ ), compared 
with the single force constant GNM ( |Bahar et al., 1997| . The atomic interactions are added up with the 
appropriate force constants for each residue, producing a residue-level model based on atomic interactions, 
with no additional free parameters, since the force constants are defined based on the contact matrices. The 
only parameter not defined from the structure is the maximum cutoff distance considered, and we optimized 
it by comparing prediction quality in calculations with a range of cutoffs from 5 A to 11 A. The analysis for 
agreement with magnitudes and directions of ADP ellipsoids with the model predictions is shown in Table 
1 of SI. With the exception of the 5 A cutoff, the results were very similar, and 9 A was selected as the 
optimal cutoff distance. 

We also tested four existing normal mode potentials, three of the ENM variety, and one based on the 
CHARMM forcefield. The EINemo method ( [Suhre & Sanejouand, 2004[ ) depends on the cutoff distance 
between atoms, and the atomic contact matrices are combined into rigid-motion blocks on a residue level. 
We varied the cutoff distance from 5 to 11 A (Table 2 of SI), and found that the best results at 5 A, compared 
with the default value of 8 A. The anisotropic network model (ANM) ( |Atilgan et al, 2001 ) depends on a 
cutoff distance between Ca atoms, and we evaluated the results for a range from 10 to 16 A (see Table 3 
of SI). The variation is also relatively small, but there is an opposite trend between quality of prediction 
of direction and magnitude of motion. The best cutoff for directionality prediction was at 10 A, while 
the best agreement in magnitude was with 16 A cutoff, in contrast to the previously used value of 13 A 
(Atilgan et al., 2001). Normal modes were computed using an atomistic forcefield (CHARMM) with rigid- 
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body blocks for residues, referred to as Block Normal Modes (BNM) ( |Li &; Cui, 20 02). The inclusion of 
non-protein ligands and cofactors in the CHARMM force-field resulted in significant increases in quality of 
prediction, and thus all the non-protein residues for which CHARMM libraries could be found were added 
to the models. The last method used is the Harmonic Ca potential (HCA) with distance-dependent force 
constant flHinsen et al, 2000[), as implemented in Molecular Modeling Toolkit (MMTK) (|Hinsen, 2000]). 



Comparison of crystallographic and computational variance 

Anisotropic covariance tensors were computed from 90 lowest frequency normal modes from 83 structures, 
as described in Methods, and fidelity of both magnitude and direction prediction was assessed. Magnitude 
prediction quality was measured by linear correlation between isotropic ADPs (B-factors) and the predicted 
isotropic variances over each structure. Two different measures were used for directional agreement, the 
absolute value of the dot product between the largest axes of the anisotropic ADPs (ellipsoids), and the real 
space correlation coefficient, as defined in Methods. These two measures were employed to compare pairs of 
corresponding residues, and the reported numbers are the statistics over all sufficiently anisotropic ellipsoids 
from all 83 structures. Table [T] shows that prediction quality was markedly different for the magnitude 
and direction of motion. All the models had average isotropic correlations of 0.66-0.68, with the exception 
of 0.6f for HCA. On the other hand, there was considerable variation in the directional agreement of ADP 
ellipsoids. The two measures of directional agreement, the dot product and the real space correlation, largely 
showed the same trend, with HCA and ANM displaying relatively weak agreement, while E1N, DNM and 
BNM, show considerably higher prediction quality, with CHARMM-based BNM having an edge over the 
ENM methods. The mean absolute value of the dot product is easy to interpret as a measure of the angle 
between the preferred direction and in the experimental and computed ellipsoids. The average value of 0.65 
for BNM corresponds to an angle of 48°, while the average of 0.56 for ANM corresponds to an angle of 56°, 
but this does not tell the whole story because it only compares one principal axis of the ellipsoids. The real 
space correlation is the volume fraction shared by two ellipsoids of unit volume, and this quantity varies 
appreciably from 0.52 for HCA to 0.6f for BNM. We tested the hypothesis that predictions agree no better 
than expected from a random uniform distribution of ellipsoid direction, for which the mean dot product is 
0.5, and the mean real space correlation is 0.3 (when anisotropy is fixed at 0.5). Almost all of the structures 
with a reasonable sample of usable ADPs (with anisotropy < 0.5) showed better than random agreement 
in real space correlation (P < 0.01, see SI). For the worst-performing method, HCA, 11 structures did not 
meet this criterion, and only four had more than 10 sufficiently anisotropic ADPs, with the highest at 24. 
The results for the best-performing BNM method had only four structures where the null hypothesis could 
not be rejected, all of which had only 5 or fewer usable ADPs. 



Effect of potential on global dynamic ensembles 

Comparison of the normal modes produced by different methods revealed a clear distinction between the 
harmonic ENM models and the CHARMM forcefield BNM. We used the modes computed from all 83 
structures to investigate how the dynamic ensemble predictions depend on the use of the potential. The 
overlap measure described in Methods was computed for the 17 lowest-frequency modes, averaged over the 
entire data set, and plotted for all 10 pairs of methods in Figure [31 The highest agreement was observed for 
the lowest-frequency modes, but the overlap measure dropped below 0.5, depending on the pair of methods, 
at some point in the first 15 modes. This demonstrated that the details of potential play a secondary role at 
lowest-frequency modes, which are dominated by the contact topology and shape of the molecular structure. 
A second observation is the distinctiveness of modes derived from CHARMM-based BNM, which showed 
much lower overlap with ENM-based methods (dotted lines) than overlap among modes from ENM-type 
potentials (solid lines), with the single exception of the overlap between ANM and E1N. Since the latter is an 
all-atom potential, it is reasonable that it should be closer to chemistry-based BNM than to Ca-based ANM. 
We tested the possibility that minimization of structures prior to BNM is responsible for the difference in 
BNM modes, by calculating DNM modes from the minimized structures. The resulting average overlap with 
BNM was 0.76 as opposed to 0.75 for BNM with DNM from unminimized structures, still much lower than 
DNM agreement with other methods. This suggests that the chemical information present in the all-atom 
CHARMM potential plays a role in determining the lowest-frequency modes, in addition to the topology of 
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the structure. 

To illustrate the differences between the chemical forccficld and ENM, we picked a small, well-studied 
structure from the data set, a PDZ2 domain from syntenin (PDB ID 1R6J) and computed the correlation 
matrices (see Methods) from the 90 low-frequency modes of ANM and BNM. Figure 0] shows correlation 
matrices computed from ANM and BNM modes. In general, they look quite similar, with major features 
determined by the secondary structure elements: anti-parallel beta sheets appear as positive bands perpen- 
dicular to the diagonal, and the two helices result in a thickening of the diagonal band. While the pattern 
of secondary structures is clear in both potentials, there are evident differences. First, the magnitude of 
correlation is at least two times weaker in ANM (see the colorbar), and the secondary structure features are 
not as clear, due to the inclusion of residues as far as 16 A away. Second, due to identical force constants 
for distant and proximal interactions, the diagonal band is considerably weaker in the ANM plot than in 
BNM, which has a more realistic representation of covalent bonds and other main-chain interactions. Both 
potentials capture the effect of gross topology, but the effects of specific chemistry are hidden in the fine 
details of the BNM correlation matrix. 



Discussion 

We analyzed five different coarse-grained potentials used to model the conformational flexibility of protein 
structures. These were evaluated both by validation against experimental data and by comparison among 
the different potentials. To our knowledge this is the first systematic attempt to use anisotropic displace- 
ment parameters to validate computational predictions, and it behooves us to note the challenges arising 
from using this data source. The reliability of ADPs has been tested before ( |Merritt, 1999[ ), with good 
agreement in ellipsoid shape observed between independently determined structures of the same protein; 
we found the same to be true for structures of myoglobin in four different crystal forms (Kondrashov, et 
al, unpublished). This shows that ADPs are robust experimental parameters, and to minimize the noise 
contributions we used the highest resolution crystal structures available. However, quantitative comparison 
between ADPs and computational predictions are not straightforward, due to contributions of experimental 
noise, model error flKuriyan et al, 1986D , rigid-body motion of the entire molecules ( |Kuriyan &; Weis, 1991D , 
and specifics of crystal environment, such as crystal contacts between copies of the protein packed in the 
lattice (Phil lips, Jr., 1990] ) and collective lattice modes flClarage et al., 1992| . Further, the ADP represents 
the best fit of a Gaussian distribution to the electron density of an atom, but anharmonic and multimodal 
positional distributions are expected for protein atoms, especially in mobile regions, such as the surface. 
Only atoms with pronounced anisotropy are used for directional comparison, which tend to lie in mobile 
regions with poorer electron density (see Figure [T]), and which are not adequately modeled with a single 
conformer (DePr isto et al., 2004[ ). Thus it is likely that many atoms in our directional dataset are not ad- 
equately modeled by the Gaussian ADP model. Despite these caveats, our results show good agreement 
between the predicted and computed ADPs: for virtually all structures, the real space correlation between 
BNM predictions and ADPs is significantly better than the expectation from a uniform random variable. 
This suggests that the influence of the factors listed above is not sufficient to overwhelm the important 
contribution of intramolecular conformational flexibility. This is consistent with a recent comparison of MD 
simulations with crystallographic B-factors which estimated that rigid-body motions contribute only 20-30% 



of total positional variance in B-factors ( |Meinhold fc Smith, 2005 ). The agreement between computation 



and experiment serves to validate both the interpretation of the experimental data and the reliability of 
computational predictions. 

In our analysis we combined multiple low-frequency normal modes to generate the anisotropic variance 
for each residue from a large number of modes, weighted by the calculated frequencies, and compare the 
result with the crystallographic variation. This method has been used in previous work applying normal 



modes to crystallographic refinement (Kidera & Go, 1990), but is not in common use for validating normal 



modes with experimental displacements. Instead, the procedure is often to project low-frequency modes 
individually onto a conformational change, and to obtain a cumulative projection coefficient. This, however, 
is impossible to do without prior knowledge of the conformational change in the structure, and gives only an 
agreement between the subspace spanned by a several modes and the conformational change. Our approach 
does not presume any knowledge beyond the initial structure, and measures agreement with the entire normal 
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mode ensemble, rather than with individual modes. 

This is also, as far as we know, the first large-scale comparative study of coarse-grained normal mode 
methods. Comparison of the modes from different potentials reveals a distinct split between the ENM 
methods and BNM, as seen in Figure[3] This suggests that the chemical information which is ignored by the 
ENM approach is observable in the BNM results, although there is significant similarity at low frequency 
modes due to the shape of the structure reflected in both potential types. The observation opens up a 
possibility of separating the effect of gross protein structure from that of detailed residue chemistry as 
reflected by the CHARMM forcefield. A careful comparison of ENM predictions with those from normal 
modes with chemical forcefield could potentially be used to determine residues whose chemistry plays a key 
role in the dynamic coupling in the structure, and which would therefore be especially sensitive to mutation. 
The visual comparison of the correlation patterns from ANM and BNM demonstrates that the chemical 
effects are subtle in comparison to the topological features captured by both BNM and ANM, and all the 
other methods. 

The choice of computational strategy to address a given problem involves balancing computational effi- 
ciency against model detail. Fast calculations are meaningless if they give unreliable results, and extremely 
accurate calculations are of no use if they cannot be completed in a reasonable time frame. Normal mode 
analysis is based on a choice to limit the model to the neighborhood of the potential minimum. Further 
simplification of using an elastic network model potential instead of a physical, all-atom potential is another 
concession towards efficient calculation and away from physical reality. BNM prediction quality of the mag- 
nitude of flexibility is similar or worse than those from ENM models. This once again demonstrates the 
robustness of the elastic network models, and suggests that the main factor in determining macromolccu- 
lar flexibility is the number of local contacts, determined by the shape of the molecule ( Halle, 2002[ ). In 
prediction of directionality of motion, there is a clear difference between methods that are based only on 
Ca coordinates, (HCA and ANM) and those that consider all atoms. CHARMM-based BNM has the best 
directional agreement as measured by ellipsoid correlation, while our new method, DNM, and E1N come 
close to matching this standard. This suggests that an all-atom ENM potential can give an accurate rep- 
resentation of the conformational ensemble of a protein near the native state, but the inclusion of chemical 
forces improves the model. 

We must also consider the cost, both computational and human, required by the different methods. One 
of the main differences between elastic network model techniques and BNM is that the latter requires an 
initial minimization step (see Methods). If minimization is not complete, subsequent diagonalization will 
lead to spurious modes with large, negative frequencies; one must be careful to only pick productive modes 
when using results BNM, while clastic network models are at a local minimum by construction. Further, 
the initial setup with an all-atom potential requires attention to the individual oddities of each structure: 
disulfide bonds, non-standard residues, bound ligands or cofactors. Each of these issues must be dealt with 
individually, thus making automation of the calculations more difficult. Compared with ENM models, in 
which most of these details are ignored, CHARMM-based normal modes require a great deal of human effort. 

The present results indicate that anisotropic temperature factors from high resolution crystal structures 
contain a measure of internal molecular flexibility, and can be used as a source of dynamic information and 
as a test for computational methodologies. Comparison of different methods indicates that elastic network 
models can describe the conformational ensemble of protein structures with accuracy approaching that of 
CHARMM, but that there is a substantial spread in prediction quality of different ENM potentials. Using 
an exclusively Ca-based potential results in a large sacrifice in prediction quality of directionality, but the 
lowest frequency modes are robust across the methods. The information may help those studying interactions 
within biological molecules choose the appropriate level of complexity for the system of interest and for the 
level of detail required of the prediction. 



Computational procedures 

We use normal mode analysis ( [Go et a/., 1983| |Brooks fc Karplus, 1983[ |Levitt et al, 1985] ) to predict the 
positional ensemble of protein structures. The different models use distinct potentials, all of which require 
the knowledge of protein structure. The Hessian matrix of the potential is diagonalized to find the normal 
modes, or eigenvectors itj and the corresponding frequencies ajf. Hui — tofui. The decomposition allows us 
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to compute the covariance matrix, which is proportional to the pseudo-inverse of the Hessian. Let Si be the 
deviation from the mean for component i, then the covariance between two deviations is: 



5iSj >= 2fc~T ^ ^2 u ^ u jk (1) 



where Uik is the ith component of the fcth normal mode with frequency cj&. Note that the modes with the 
lowest frequencies make the greatest contribution to residue mobility, so a small fraction of all the modes is 
sufficient to obtain a good approximation of the sum. This allows us to compute anisotropic variances as 
3x3 blocks around the diagonal of the covariance matrix (Kidera fc Go, 1990[). 



We may also compute the correlation coefficient between the deviations of any two atoms, to generate 
the global correlation matrix: 

R(S U <L) = , <5i8 '> = (2) 
^< SiSi >< 8 3 8j > 

Elastic Network Models 



Anisotropic Network Model (ANM) (Atilg an et al, 2001[ ) is a version of Elastic Network Model (ENM), 



based on connecting residues with Ca atoms within a cutoff distance R c with spring-like interactions. The 
Hessian matrix is a 3Nx3N matrix, where N is the number of residues, consisting of 3x3 submatrices Hij 
which depend on the direction of the vector between Ca atoms i and j, and are if the Cas are more than 
R c apart. The diagonal submatrices are defined as follows: Ha = —^2jHij. This defines a coarse-grained 
Elastic Network Model of a protein structure with directional information. We implemented this alrogithm 
using perl code to read PDB files and construct the Hessian, with MATLAB (The Mathworks, Inc., Natick, 
MA) scripts used for diagonalization. 

We introduce two modifications to ANM, analogous to those we had previously proposed for isotropic 
models ( |Kondrashov et al., 2006[ ), and term the new model Distance Network Model (DNM). First, the 
connectivity of the elastic potential is based on distances between nonhydrogen atoms of residue pairs, 
instead of only the Ca atoms. The contacts from all atoms are combined for each residue to yield an 
interaction potential at the residue level. Second, we introduce different classes of residue interactions 
based on interatomic distances, with distinct Hookean spring constants. We use distance bins to define the 
interaction classes, specifically, covalent interactions are found by distance less than 2.3 A, the next shell is 
up to 3.3 A, followed by 5, 7, 9, and II A. The Hessian matrix for each bin is defined exactly as for ANM 
above, with the difference that the equilibrium distance between two atoms has to be in the distance bin, 
while the coordinates (xi,yi,Zi) for residue i remain the Ca coordinates. If H a is the contact matrix for 
class a, the total Hessian matrix for DNM is a linear combination of the matrices, with k a as the interaction 
constant for each class: ^ 

Htotal = k » H * = £ tr{H a ) ^ 

The constants k a define the strength of interactions, and we choose to use the total number of contacts in each 
class as a normalization constant, k a — l/tr(H a ). Thus, DNM adds several different interaction constants, 
but these are defined from the contact matrices, and thus are not free parameters to be optimized. The only 
free parameter, as in other ENM, is the cutoff distance for atomic contacts, which we vary from 5 to 11 A, 
as described in Results. The implementation again used a combination of perl and MATLAB scripts. 

The details of the normal mode analysis implementation of Molecular Modeling ToolKit have been 
described elsewhere (Hinsen, 2000; Hins en et al., 1999] ). For this study, we used the harmonic Ca forcefield 



(HCA) (Hins en et al., 2000 ), which defines different interaction constants for covalently bonded and non- 



covalently bonded Ca atoms. The model uses the reciprocal of distance to weight the harmonic interaction 
constants, and no parameters are varied from the default values. The MMTK calculations for all 83 structures 
in the dataset are carried out in 2 hours on a single 2 GHz AMD Athlon processor with 2 GB of RAM. 

EINcmo (E1N) ( jSuhre fc Sanejouand, 2004] ), is an all-atom ENM, which constructs a contact matrix for 
all atoms within a certain radius, and then treats blocks of one or more residues, as rigid bodies using 
the Rotation- Translation blocking algorithm ( |Tama et al., 2000] ). The two main programs that constitute 
EINemo, pdbmat and diagrtb, were kindly provided by the authors and installed on the local cluster. All 
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blocking is done on a residue by residue basis and the interaction cutoff distance is varied from 4 A to 11 
A. Running E1N on all 83 structures in the dataset using 8 different cutoff distances took roughly 1 day to 
complete on a 100 node cluster of 2.2 GHz Apple G5 processors with 4 GB of RAM. 



Block Normal Modes with CHARMM 

Block normal- mode analysis (BNM), originally suggested by Tama, et. al. ( |Tama et al., 2000] ) and subse- 
quently improved by Li and Cui ( |Li fc Cui, 2002[ ), computes an all-atom Hessian which is then projected 
onto a blocked space spanned by the rotational and vibrational degrees of freedom of predefined blocks; in 
this work each residue is treated as a rotation-translation block, as in EINemo method above. For this level 
of coarse-graining, the procedure reduces the Hessian storage space by approximately a factor of 25 and the 
diagonalization time by a factor of 125. The resulting blocked eigenvectors are then projected back to the 
all-atom space to give all-atom eigenvectors. This procedure perturbs the magnitudes of the eigenvalues, 



but in a linear fashion for the low-frequency modes (Tama et al, 2000 Li & Cui, 2002). The appropriate 



scale factor of 1.7 has been used in this work. Local minimization is performed to ensure that the linear 
term in the Taylor expansion of the potential is zero ( |Hayward, 200l[ ). This minimization is completed 
using cycles of the adapted-basis Newton-Raphson method with gradually decreasing harmonic constraints 
to remove local steric clashes without perturbing the structure significantly. A final minimization with no 
harmonic constraints is performed until the RMS energy gradient reached 0.01 kcal/mol/A. The average 
minimization time for this set is approximately eight minutes, but minimization times vary widely because 
of protein size: 54 seconds for the 52 residue 1RB9, and 73 minutes for the 325 residue 107J, all computed 
on 1.8 GHz Athlon single-processor station with 1 GB of memory, running Red Hat Linux 7.2. In some of 
the systems studied, this level of minimization resulted in modes with large negative frequencies in addition 
to the normal six rotational/translational modes. In these cases, these modes are ignored for all subsequent 
calculations. The average diagonalization time for BNM is approximately six minutes, varying widely again: 
48 seconds for the 1RB9, and 51 minutes for 107J. All calculations are completed using the CHARMM suite 
of programs ( |Brooks et al., 1983fl . The extended atom CHARMM19 force field flNeria et al, 1996[ ) modified 
for use with the EEF1 solvation model {Lazaridis & Karplus, 1999) is used for both minimizations and the 
BNM. 



Measures of agreement with crystallographic data 



The data set is obtained by searching the Protein Data Bank (Berman et al., 2000) for protein structures 
determined by X-ray crystallography to at least 1.0 A resolution, containing at least 50 residues in a single 
chain. Structures with more than 50 % identity are discarded, leaving 98 non-redundant proteins, of which 
87 contained ANISOU cards (anisotropic temperature factors) ; 4 more structures are discarded because they 
contained modified protein residues for which CHARMM libraries are not available. The resultant set is 



structurally diverse, with all major SCOP superfamilies (Murzin et al., 1995) represented, as shown in Table 
1 in Supplemental Materials. All protein chains in the PDB files are kept in the model in order to best 
represent the crystal environment. Copies of the protein molecule surrounding the structure in the crystal 
are generated using the symexp command in PyMOL ( [The PyMOL Molecular Graphics System, 2002[ ), and 
residues with at least one atom less than 4 A from an atom in a crystal copy are considered to be involved 
in crystal contacts. These residues, along with those with Ca atoms with occupancy less than 1, are not 
used in ADP comparisons. 

The anisotropic parameters are 3x3 matrices that define the variance of a 3-dimcnsional Gaussian prob- 
ability distribution for position of each atom: 



p(R) = 



detU~ 
3tt 3 



exp 



U X x 

Ux.ii U', 



\ 



'xy 

u x 



xy 

yy 



yz 



R 



U vz U z 



The six components of ADPs, U xx , etc., are reported in PDB files in ANISOU cards ( jBerman et al., 2000[ ). 
We compare the computationally predicted anisotropic parameters V with those from the crystal structures 
U. Quality prediction of magnitude of motion is measured by linear correlation of the traces of the matrices 
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U and V over the whole structure, which we call isotropic correlation (IC). To compare directions of ellip- 
soids, we first divide all the matrices by their trace, to set all magnitudes to 1. Ellipsoids are described by 
their principal axes (eigenvectors) and the associated lengths (inverse eigenvalues) ; the ratio of the smallest 
to the largest eigenvalue is called its anisotropy ( |Trueblood et al, 1996[ ). We restrict directionality compar- 
ison to ellipsoids with anisotropy of less than 0.5, since directional comparison of near-spherical ellipsoids 
is meaningless. The simplest comparison of directionality is the absolute value of the dot product between 
the major directions. It is a rough estimate of agreement for two ellipsoids whose major axes are dominant, 
but has the virtue of simplicity. A more systematic measure of ellipsoid similarity was proposed by Merritt 
QMerritt, 1999[ ), based on computation of the overlap integral between two probability densities. This mea- 
sure, known to crystallographers as real-space correlation coefficient, is defined for two three-dimensional 
Gaussian distributions with covariance matrices U and V as follows: 

(det^-^de^-))^ 

[l/8det(J7- 1 +V- 1 )} 1/2 

We also compare modes produced by the different normal mode potentials. To compare mode i (as 
ordered by frequency) from two methods, we take the average between the best agreement for mode i from 
method a with modes from method b, and the best agreement for mode i from method b with the modes 
from method a. We compare the modes similar in frequency ordering, specifically, only the modes no more 
than 3 indices higher or lower. The formula for overlap for mode i between method a and b is: 

O a ,b(i) = \ max u°; ■ u b + max u\ ■ u" (5) 

2 i-3<j<*+3 3 2 i-3<j<i+3 3 

If the best agreement is between modes of the same index, then the two maxima are the same. Figures[3]and 
2]were prepared using MATLAB (The Mathworks, Inc., Natick, MA) and figure [1] with rastep and raster3d 
dMerritt fc Bacon, 1997) . 
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Tables 



Table 1: Prediction quality using different NMA potentials 





dot* 


real space* 


isotropic corr" 


random 


0.5 


0.3 





HCA 


0.599 (0.199) 


0.520 (0.167) 


0.617 (0.131) 


ANM 


0.556 (0.190) 


0.525 (0.172) 


0.676 (0.111) 


DNM 


0.650 (0.209) 


0.575 (0.181) 


0.655 (0.136) 


E1N 


0.641 (0.208) 


0.583 (0.184) 


0.680 (0.128) 


BNM 


0.658 (0.211) 


0.608 (0.188) 


0.658 (0.128) 



* mean and standard deviation over all individual ADPs; ' mean and standard deviation over 83 
structures. 
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Figures 




Figure 1: Example of a high-resolution protein structure (PDB ID 1R6J) showing anisotropic 
temperature factors for backbone atoms. The ellipsoids represent 90% probability volume of atomic 
position, with color varying from immobile (blue) to more mobile (red). Most of the backbone 
atoms are not mobile and isotropic, with a few loop residues (residue numbers labeled) showing 
clear directional preference in positional distribution. 
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0.12 




Anisotropy 



Figure 2: Distributions of anisotropy parameters in structures refined with SHELX and Refmac 
software. The large difference is likely due to different default restraints on the anisotropic param- 
eters in the two. 
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Figure 3: Overlap scores for normal modes from different potentials averaged over all 83 structures. 
Each curve is a comparison between a pair of potentials over the 17 lowest frequency modes. The 
solid curves compare different ENM-like potentials, while the dotted curves compare CHARMM- 
based BNM results with those from ENM potentials. 
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Figure 4: Correlation matrices generated from normal mode analyses of PDZ domain (PDB ID 
1R6J). The plots show correlation between residues with indices shown on x and y axes, blue color 
indicating negative correlation and red signifying positive, with the range shown in the colorbars. 
Secondary structure elements are labeled in sequence order. A) Correlation from Anisotropic 
Network Model; B) from CHARMM-based Block Normal Modes. Note that the range in plot A is 
half that of B. 
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